(57) Abstract



A sprite-based coding system includes an encoder and decoder where sprite-building is automatic and segmentation of the sprite
object is automatic and integrated into the sprite building as well as the coding process. The sprite object is distinguished from the rest
of the video objects on the basis of its motion. The sprite object moves according to the dominant component of the scene motion, which is
usually due to camera motion or zoom. Hence, the sprite-based coding system utilizes dominant motion to distinguish background images
from foreground images. The sprite-based coding system is easily integrated into a video object-based coding framework such as MPEG-4,
where shape and texture of individual video objects are coded separately. The automatic segmentation integrated in the sprite-based coding
system identifies the shape and texture of the sprite object.
DESCRIPTION



SPRITE-BASED VIDEO CODING SYSTEM 




Field of the Invention 

This invention relates to a mechanism by which a sprite (also called a mosaic) is built
automatically both in an encoder and a decoder, operating in a separate shape/texture
coding environment such as MPEG-4. We also discuss applications that will utilize this
technology.

Background of the Invention

A mosaic image (the terms mosaic and sprite will be used interchangeably) is built from
images of a certain scene object over several video frames. For instance, a mosaic of the
background scene in the case of a panning camera will result in a panoramic image of the
background.

In MPEG-4 standardization activities, two major types of sprites and sprite-based coding
are defined. The first type is called off-line static sprite. An off-line static sprite is a
panoramic image which is used to produce a sequence of snapshots of the same video
object (such as background). Each individual snapshot is generated by simply warping
portions of the mosaic content and copying them to the video buffer where the current video
frame is being reconstructed. Static sprites are built off-line and are transmitted as side
information.

The second type of mosaic is called on-line dynamic sprite. On-line dynamic sprites are
used in predictive coding of a video object. A prediction of each snapshot of the video
object in a sequence is obtained by warping a section of the dynamic sprite. The residual
signal is coded and used to update the mosaic in the encoder and the decoder
concurrently. The content of a dynamic mosaic may be constantly updated to include the
latest video object information. As opposed to static sprites, dynamic sprites are built
on-line simultaneously in the encoder and decoder. Consequently, no additional information
needs to be transmitted.

Summary of the Invention 

We have described a unified MPEG-4 syntax [2] for off-line static sprite and on-line
dynamic sprite-based coding. Our syntax also allows new modes
that we refer to as "dynamic off-line sprite-based coding," where predictive coding is
performed on the basis of an off-line sprite (as opposed to directly copying the warped
sprite as in the case of off-line static sprites), and "on-line static sprite-based coding,"
where the encoder and the decoder stop building the sprite further, and use it as a static
sprite whether it is partially or fully completed.

Both off-line static and on-line dynamic sprite-based coding require constructing a sprite.
In the former case, the sprite is built prior to transmission. In the latter case, the sprite is
built on-line during the transmission. So far, MPEG-4 has assumed that the outline
(segmentation) of the object for which the sprite is going to be built is known a priori at
every time instant. Although this is true in certain applications, especially in
post-production or content generation using blue screen techniques, automatic segmentation
should be an integral part of sprite building in general. There is therefore a need for
sprite-based coding systems where sprite building does not require a priori knowledge of scene
segmentation.

In this disclosure, we describe a sprite-based coding system (encoder and decoder) where
sprite-building is automatic and segmentation of the sprite object is automatic and
integrated into the sprite building as well as the coding process.

We assume that the sprite object can be distinguished from the rest of the video objects on
the basis of its motion. We assume that the sprite object moves according to the dominant
component of the scene motion, which is usually due to camera motion or zoom. Hence,
our system utilizes dominant motion, which is known to those of skill in the art.

Our system is suitable for a video object-based coding framework such as MPEG-4 [3],
where shape and texture of individual video objects are coded separately. The automatic
segmentation integrated in the described system identifies the shape and texture of the
sprite object.

There are several possible applications of the invention. In very low bit rate applications,
coding of video frames in terms of the video objects within them may be expensive, because the
shape of such objects may consume a significant portion of the limited bit budget. In such
cases our system can fall back to frame-based coding where automatic segmentation is
only used to obtain better dominant motion estimation for sprite building and dominant
motion-compensated prediction, as described in the "Operations" section, later herein.

The described coding system has features which make it suitable for applications where
the camera view may change frequently, such as video conferencing with multiple cameras, or
a talk show captured with more than one camera. Our system may be applied to building
multiple sprites and using them as needed. For instance, if the camera goes back and forth
between two participants in front of two different backgrounds, two background sprites
are built and used as appropriate. More specifically, when background A is visible,
building of the sprite for background B and its use in coding is suspended until
background B appears again. The use of multiple sprites in this fashion is possible within
the MPEG-4 framework, as will be described in the "Operations" section.
The disclosed system generates a sprite during the encoding process as will be described
later herein. However, the resulting sprite may be subsequently used, after coding, as a
representative image of the compressed video clip. Its features can be used to identify the
features of the video clip, which can then be used in feature-based (or content-based)
storage and retrieval of video clips. Hence sprite-based coding provides a natural fit to
populating a video library of bitstreams where sprite images generated during the encoding
process act as representative images of the video clips. Indeed the mosaics can also be
coded using a still image coding method. Such a video library system is depicted in Fig. 5.

In a similar fashion, one or several event lists may be associated with a background sprite.
A possible choice for an event list is the set of consecutive positions of one or several
vertices belonging to each foreground object. Such a list can then be used to generate
a token representative image of the foreground object position in the sprite. Consecutive
positions of each vertex could either be linked by a straight line or could share a distinct
color. The consecutive positions of the vertex may be shown statically (all successive
positions in the same sprite) or dynamically (vertex positions shown in the mosaic
successively in time). A vertex here can be chosen to correspond to any distinctive feature
of the foreground object, such as the center of gravity or a salient point in the shape of the
object. In the latter case, and if several vertices are used simultaneously, the vertices might
be arranged according to a hierarchical description of the object shape. With this
technique, a user or a presentation interface has the freedom to choose between coarse and
finer shapes to show successive foreground object positions in the background sprite. This
concept may be used in a video library system to retrieve content based on motion
characteristics of the foreground.
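
As an illustration of the event-list idea, the sketch below stores, for each time instant, the positions of a fixed set of vertices and derives the straight-line segments that link consecutive positions of each vertex. The function name `vertex_tracks` and the `(time, [(x, y), ...])` layout are assumptions made for this example, not part of the described system.

```python
def vertex_tracks(event_list):
    """event_list: list of (time, [(x, y), ...]) with a fixed vertex order.

    Returns, for each vertex, the list of line segments linking its
    consecutive positions in mosaic coordinates.
    """
    tracks = []
    vertex_count = len(event_list[0][1])
    for i in range(vertex_count):
        points = [vertices[i] for _, vertices in event_list]
        # Pair each position with the next one to form a segment.
        tracks.append(list(zip(points, points[1:])))
    return tracks


# Two vertices of a foreground object observed at three time instants.
events = [
    (0, [(0, 5), (2, 5)]),
    (1, [(1, 5), (3, 5)]),
    (2, [(4, 5), (6, 5)]),
]
tracks = vertex_tracks(events)
```

Drawing all segments at once corresponds to the static presentation; drawing them one time step at a time corresponds to the dynamic presentation.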

The automatic sprite building portion of the described system may be used in an off-line
mode in a video conferencing application where the off-line sprite is built prior to
transmission. Such a system is depicted in Fig. 6. The described system can
also generate a sprite that has a higher spatial resolution than the original images.

Brief Descriptions of the Drawings 

Fig. 1 illustrates the steps used in the method of the invention at time t-1.
Fig. 2 illustrates the steps used in the method of the invention at time t to t+1.
Fig. 3 illustrates the steps used in the method of the invention at time t+1 to t+2.
Fig. 4 is a block diagram of the method of the invention.
Fig. 5 is a block diagram of the system of the invention.
Fig. 6 depicts the system and method of the invention as used in a video conferencing
system.
Fig. 7 depicts how consecutive positions of a foreground object may be represented in a
mosaic according to the invention.

Detailed Description of the Preferred Embodiments

The described method is designed to progressively learn to dissociate foreground from
background while building a background mosaic at the same time. Steps 1 to 10 are
repeated until the construction of the background is complete or until it is aborted.

Assumptions 

The notations are as follows: 
$I(s,t)$ denotes the content of the video frame at spatial position $s$ and at time $t$.

$W_{t \leftarrow (t-1)}\{I(s,t-1)\}$ denotes a warping operator which maps the image at time $(t-1)$ to time
$t$. For a given pixel location $s_0$ in a video buffer at time $t$, this warping operation is
performed by copying the pixel value at the corresponding location $s$ in frame $(t-1)$. The
correspondence between location $s_0$ and location $s$ is established by a particular and
well-defined transformation such as an affine or a perspective transformation.
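
The warping-by-copy operation can be sketched as follows. This is a minimal illustration under stated assumptions: the transformation is affine with six parameters, source locations are rounded to the nearest pixel, and target locations without a source pixel are filled with `None`. The function name `warp_affine` and the parameter layout are hypothetical.

```python
def warp_affine(prev, params, height, width, fill=None):
    """Backward warp: for each target location s0 = (x0, y0), copy the pixel
    at the corresponding source location s = A * s0 + b (nearest neighbour)."""
    a11, a12, a21, a22, bx, by = params
    out = [[fill] * width for _ in range(height)]
    for y0 in range(height):
        for x0 in range(width):
            x = round(a11 * x0 + a12 * y0 + bx)
            y = round(a21 * x0 + a22 * y0 + by)
            # Copy only when the source location lies inside the frame.
            if 0 <= y < len(prev) and 0 <= x < len(prev[0]):
                out[y0][x0] = prev[y][x]
    return out


prev = [[1, 2, 3],
        [4, 5, 6]]
# Pure translation: target (x0, y0) reads source (x0 + 1, y0).
shifted = warp_affine(prev, (1, 0, 0, 1, 1, 0), 2, 3)
```

Locations left at `None` have no support in the previous frame, which is the situation the later steps treat as a newly uncovered image region.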

$S_x(s,t)$ is an indicator buffer, say for quantity $x$, which can be either 1 or 2 bits deep for all
spatial locations $s$.

Thresh is a threshold value. The operations < Thresh and > Thresh are symbolic and can
represent complex thresholding operations.

The size (per color component) of the current image frame $I(s,t)$ is $M_t \times N_t$, and the size
of the previous compressed/decompressed frame after warping,
$W_{t \leftarrow (t-1)}(C^{-1}C\{I(s,t-1)\})$, is such that it can be inscribed in a rectangular array of
$M_{t-1} \times N_{t-1}$ pixels.

The sprite $M(s,t)$ is an image intensity (texture) buffer of size $M_m \times N_m$ per color
component. The field $S_{\mathrm{mosaic}}(s,t)$ is a single-component field of the same size.

The construction of the sprite is started at time $t$. The image $I(s,t-1)$ has already been
compressed and decompressed and it is available in both the encoder and the decoder.
In the following steps, the image content is assumed to have a background and a
foreground part (or VO), and a mosaic of the background is built.

Step 1: Initialization. 

Referring now to Figs. 1-3, the results of the steps of the method described in the previous
section are depicted. Fig. 1 illustrates steps 1 through 11 from time t-1, the instant when
mosaic building is initiated, to time t when a new video frame or field has been
acquired. Figs. 2 and 3 illustrate steps 2 through 11 from time t to t+1 and from time t+1
to t+2, respectively. At the top left corner in each of these figures (A) is shown the newly
acquired video frame, which is compared to the previous video frame (next image field to
the right) (B) once it has been compressed/de-compressed and warped (step 2). Step 3 is
illustrated by the rightmost image field (C) in the first row of each figure. This field shows
the area where content change has been detected. The status of the mosaic buffer is
shown in the leftmost image field in the second row (D). This buffer is used to identify the
new background areas as described in step 4. These areas correspond to regions where
background was not known until now. Foreground identification is illustrated by the
rightmost image in the second row (F). The operations associated with this image are
described in step 5, which uses the change map, the mosaic and the new background areas
to define the foreground. Steps 6 and 7 of the method are illustrated by the two leftmost
image fields in the third row (G, H). Here, background information comes from the
compressed/decompressed foreground information obtained in the previous step. Finally,
the mosaic updating process is illustrated by the bottom right image field (I). This process
takes place in steps 8, 9, 10 and 11 of the method.

The binary field $S_{\mathrm{mosaic}}(s,t)$ is initialized to 0 for every position $s$ in the buffer, meaning
that the content of the mosaic is unknown at these locations.

The content of the mosaic buffer $M(s,t)$ is initialized to 0.

The warping parameters from the current video frame $I(s,t-1)$ to the mosaic are
initialized to be $W_{t_0 \leftarrow (t-1)}(\,)$, $t_0$ here representing an arbitrary fictive time. This initial
warping is important as it provides a way to specify the "resolution" or the "time
reference" used to build the mosaic. Potential applications of this initial mapping are
making a mosaic with super spatial resolution or selection of an optimal time $t_0$
minimizing the distortion introduced by the method. These initial warping parameters are
transmitted to the decoder.

Step 2: Acquisition. 

The image $I(s,t)$ is acquired and the forward warping parameters for mapping the image
$I(s,t-1)$ to $I(s,t)$ are computed. The number of warping parameters as well as the
method for estimating these parameters are not specified here. A dominant motion
estimation algorithm such as the one given in [4] may be used. The warping parameters
are composed with the current warping parameters, resulting in the mapping $W_{t \leftarrow t_0}(\,)$.
These parameters are transmitted to the decoder.

Step 3: Detect Change in Content Between Previously Coded/Decoded Frame and
Current Frame.

i) Initialize a large buffer of size $M_b \times N_b$, greater than the image
($M_b > M_t$, $N_b > N_t$) and possibly as large as the mosaic. The buffer is 2 bits deep at
every location. The buffer is initialized to 3 to indicate unknown status.

ii) Compute (motion compensated) scene changes over common image support. Give
label 0 to all locations where change in content is deemed small. Give label 1a to
locations where change is detected to be large. To make regions more homogeneous,
implement additional operations (e.g. morphological operations) which either reset the
label from 1a to 0 or set the label from 0 to 1a. Regions labeled 0 will typically be
considered and coded as part of the background Video Object, while regions labeled 1a
will typically be encoded as part of the foreground Video Object.

$$S_{\mathrm{change}}(s,t) = \begin{cases} 0 & \text{if } \left| I(s,t) - W_{t \leftarrow (t-1)}\big(C^{-1}C\{I(s,t-1)\}\big) \right| \le \mathit{Thresh}_{\mathrm{change}} \\ 1a & \text{otherwise} \end{cases}$$

where $\mathit{Thresh}_{\mathrm{change}}$ denotes a pre-defined threshold value.

iii) Tag new image regions, where support of image at time $t$ does not overlap with
support of image at time $(t-1)$, as

$$S_{\mathrm{change}}(s,t) = 1b$$
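
The labelling in step 3 can be sketched per pixel as below. This is a simplified illustration: frames are plain 2-D lists of intensities, the warped previous frame uses `None` where it has no support, the thresholding is a simple absolute difference, and the morphological clean-up and the larger 2-bit buffer are omitted. The function name `detect_change` is an assumption of this example.

```python
def detect_change(cur, warped_prev, thresh):
    """Label each pixel: 0 (small change), '1a' (large change over common
    support) or '1b' (no overlap with the warped previous frame)."""
    labels = []
    for row_cur, row_prev in zip(cur, warped_prev):
        row = []
        for c, p in zip(row_cur, row_prev):
            if p is None:                 # previous frame has no support here
                row.append("1b")
            elif abs(c - p) <= thresh:    # change deemed small
                row.append(0)
            else:                         # change deemed large
                row.append("1a")
        labels.append(row)
    return labels


cur = [[10, 10, 90],
       [10, 12, 50]]
warped_prev = [[10, 11, 10],
               [10, 10, None]]
labels = detect_change(cur, warped_prev, thresh=5)
```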

Step 4: Identify New Background Areas. 

A new background area is detected if there has not been any change in image content in
the last two video frames. The corresponding area in the mosaic must also indicate that the
background at this location is unknown. The resulting new background area is then
pasted to any neighboring regions where background is known. As will be seen in later
steps, incorporation of new background data into the mosaic must be done according to
compressed/de-compressed background shape information to avoid any drift between
encoder and decoder.

$$S_{\mathrm{nbg}}(s,t) = \begin{cases} 1 & \text{if } \big(S_{\mathrm{change}}(s,t) = 0\big) \text{ and } \big(W_{t \leftarrow t_0}(S_{\mathrm{mosaic}}(s,t-1)) = 0\big) \\ 0 & \text{otherwise} \end{cases}$$

Here, the indicator value 0 means that the background is unknown.
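
A minimal sketch of the new-background test follows. It assumes the change labels of step 3 and a mosaic indicator map that has already been warped into the current frame's coordinates; the pasting to neighboring known regions is not modelled. The function name `new_background` is hypothetical.

```python
def new_background(change, mosaic_known_warped):
    """Return 1 where content did not change and the mosaic is still
    unknown at the corresponding location, 0 elsewhere."""
    return [
        [1 if (c == 0 and k == 0) else 0 for c, k in zip(row_c, row_k)]
        for row_c, row_k in zip(change, mosaic_known_warped)
    ]


change = [[0, 0, "1a"],
          [0, 0, "1b"]]
known = [[1, 0, 0],
         [0, 1, 0]]
nbg = new_background(change, known)
```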


Step 5: Perform Foreground/Background Segmentation. 

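First look at regions where the background is known ($S_{\mathrm{mosaic}}(s,t-1) = 1$). Perform
thresholding to distinguish the foreground from the background (case (i)). For regions
where background is not known, tag as foreground any regions where changes have
occurred (labels 1a and 1b defined in step 3) (cases (iii) and (iv)).

Case (ii) represents new background areas, which are excluded from being part of the
foreground.

i) If $W_{t \leftarrow t_0}(S_{\mathrm{mosaic}}(s,t-1)) = 1$

$$S_{\mathrm{fg}}(s,t) = \begin{cases} 1a & \text{if } \left| I(s,t) - W_{t \leftarrow t_0}(M(s,t-1)) \right| > \mathit{Thresh}_{\mathrm{fg}} \\ 0 & \text{otherwise} \end{cases}$$

where $\mathit{Thresh}_{\mathrm{fg}}$ is a pre-defined threshold value which is used here to segment foreground
from background.

ii) else if $S_{\mathrm{nbg}}(s,t) = 1$

$$S_{\mathrm{fg}}(s,t) = 0$$

iii) else if $\big(W_{t \leftarrow t_0}(S_{\mathrm{mosaic}}(s,t-1)) = 0\big)$ and $\big(S_{\mathrm{change}}(s,t) = 1a\big)$

$$S_{\mathrm{fg}}(s,t) = 1a$$

iv) else if $\big(W_{t \leftarrow t_0}(S_{\mathrm{mosaic}}(s,t-1)) = 0\big)$ and $\big(S_{\mathrm{change}}(s,t) = 1b\big)$

$$S_{\mathrm{fg}}(s,t) = 1b$$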






WO 98/29834 



PCT/JP97/04814 




In (iii) and (iv), a sub-classification of the regions tagged as 1 into either regions 1a
or 1b is used for the purpose of providing the encoder with flexibility in its
macroblock selection rules. For example, regions tagged as 1a might preferably be
coded with inter-frame macroblocks since these regions occur over common
image support. On the other hand, regions tagged as 1b might preferably be coded with
intra-frame macroblocks since these regions do not share a common support with the
previous frame.
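
The four cases of step 5 amount to a per-pixel decision chain, sketched below. The inputs are assumed to be already aligned to the current frame: `mosaic_pred` is the warped mosaic content, `known` the warped mosaic indicator, `nbg` the new-background flag from step 4 and `change` the label from step 3. The function name and the fall-through value are assumptions of this example.

```python
def segment_pixel(pixel, mosaic_pred, known, nbg, change, thresh_fg):
    """Return the foreground label for one pixel: 0, '1a' or '1b'."""
    if known == 1:                                   # case (i)
        return "1a" if abs(pixel - mosaic_pred) > thresh_fg else 0
    if nbg == 1:                                     # case (ii)
        return 0
    if change == "1a":                               # case (iii)
        return "1a"
    if change == "1b":                               # case (iv)
        return "1b"
    # Not covered by the four cases in the text; treated as background
    # here purely as an assumption of this sketch.
    return 0


labels = [
    segment_pixel(100, 20, known=1, nbg=0, change="1a", thresh_fg=10),
    segment_pixel(22, 20, known=1, nbg=0, change=0, thresh_fg=10),
    segment_pixel(50, None, known=0, nbg=1, change=0, thresh_fg=10),
    segment_pixel(50, None, known=0, nbg=0, change="1a", thresh_fg=10),
    segment_pixel(50, None, known=0, nbg=0, change="1b", thresh_fg=10),
]
```

An encoder following the preference described above could then favor inter-frame macroblocks where the label is `'1a'` and intra-frame macroblocks where it is `'1b'`.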

Step 6: Compress/Decompress Foreground Shape and Texture. 

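Use a conventional (I, P or B-VOP) prediction mode to encode foreground regions labeled
as 1a and 1b. In the case of P or B-VOPs, individual macroblocks can either use
inter-frame prediction or intra-frame coding. The pixels corresponding to regions labeled 1b
are favored to be coded as intra macroblocks. The shape of the foreground is compressed
and transmitted as well. Once de-compressed, this shape is used by the encoder and
decoder to update the content of the
mosaic. This process can be performed using the MPEG-4 VM 5.0 [3].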

Step 7: Get Background Shape. 

Get background shape from compressed/de-compressed foreground shape.
Compression/de-compression is necessary here to ensure that encoder and decoder share
the same shape information.

$$S_{\mathrm{bg}}(s,t) = \begin{cases} 1 & \text{if } C^{-1}C\{S_{\mathrm{fg}}(s,t)\} = 0 \\ 0 & \text{otherwise} \end{cases}$$

where $C^{-1}C\{\,\}$ denotes shape coding/decoding, which for instance can be performed as
described in [3].

Step 8: Initialize New Background Texture in Mosaic.

Identify regions where new background has occurred and initialize mosaic with content
found in previous video frame (time $(t-1)$). Note that the field $S_{\mathrm{nbg}}(s,t)$ cannot be used
here since this information is unknown to the decoder.

$$M'(s,t-1) = M(s,t-1) \quad \text{if } S_{\mathrm{mosaic}}(s,t-1) = 1$$

Elsewhere, in regions of new background, $M'(s,t-1)$ takes the content of the previous
video frame.

Step 9: Calculate Background Texture Residuals From Mosaic Prediction.

If $S_{\mathrm{bg}}(s,t) = 1$, calculate the difference signal by using the mosaic content as predictor. The
resulting $\Delta I(s,t)$ is used to compute the difference signal over the entire macroblock
where the pixel $(s,t)$ is located. This difference signal is compared to conventional
difference signals produced by using prediction from the previous and the next video
frame (P or B prediction mode). The macroblock type is selected according to the best
prediction mode. The residual signal is transmitted to the decoder along with the
compressed background shape as described in [2].

$$\Delta I(s,t) = I(s,t) - W_{t \leftarrow t_0}\big(M'(s,t-1)\big)$$
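
The comparison between the mosaic prediction and the conventional predictions can be sketched as a per-macroblock mode choice. The text only says that the best prediction mode is selected; using the sum of absolute differences (SAD) as the criterion is an assumption of this example, as are the function names and the mode labels.

```python
def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for row_b, row_p in zip(block, pred)
               for b, p in zip(row_b, row_p))


def select_mode(block, predictions):
    """predictions maps a mode name to its predicted block; return the
    mode with the smallest SAD."""
    return min(predictions, key=lambda mode: sad(block, predictions[mode]))


def residual(block, pred):
    """Difference signal: current content minus prediction."""
    return [[b - p for b, p in zip(row_b, row_p)]
            for row_b, row_p in zip(block, pred)]


block = [[10, 12],
         [11, 13]]
predictions = {
    "mosaic": [[10, 12], [11, 12]],          # warped mosaic content
    "previous_frame": [[8, 12], [11, 15]],   # conventional P prediction
}
mode = select_mode(block, predictions)
delta = residual(block, predictions[mode])
```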

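Step 10: Update Background Shape in Mosaic.

Update mosaic map to include shape of new background.

$$S_{\mathrm{mosaic}}(s,t) = \begin{cases} 1 & \text{if } S_{\mathrm{mosaic}}(s,t-1) = 1 \\ 1 & \text{if } \big(W_{t_0 \leftarrow t}(S_{\mathrm{bg}}(s,t)) = 1\big) \text{ and } \big(S_{\mathrm{mosaic}}(s,t-1) = 0\big) \\ 0 & \text{otherwise} \end{cases}$$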
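
Steps 7 and 10 together can be sketched as two small map operations. For brevity the sketch assumes that the decoded foreground labels and the mosaic map are already in the same coordinate system, so the warp is the identity; the function names are hypothetical.

```python
def background_shape(fg_decoded):
    """Step 7: background is wherever the decoded foreground label is 0."""
    return [[1 if label == 0 else 0 for label in row] for row in fg_decoded]


def update_mosaic_map(prev_map, bg_shape):
    """Step 10: keep locations already known, add newly seen background."""
    return [
        [1 if known == 1 or (bg == 1 and known == 0) else 0
         for known, bg in zip(row_map, row_bg)]
        for row_map, row_bg in zip(prev_map, bg_shape)
    ]


fg_decoded = [[0, 0, "1a", "1b"]]
prev_map = [[1, 0, 1, 0]]
bg_shape = background_shape(fg_decoded)
new_map = update_mosaic_map(prev_map, bg_shape)
```

The last location stays unknown because it is covered by foreground and was never seen as background.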



Step 11: Update Mosaic. 

Update content of the mosaic in regions corresponding to new or non-covered
background in frame $t$.

$$M(s,t) = \left[1 - \alpha\, W_{t_0 \leftarrow t}\big(S_{\mathrm{bg}}(s,t)\big)\right] M'(s,t-1) + \alpha\, W_{t_0 \leftarrow t}\big(S_{\mathrm{bg}}(s,t)\big) \left[ M'(s,t-1) + W_{t_0 \leftarrow t}\big(C^{-1}C\{\Delta I(s,t)\}\big) \right]$$

The selection of the value of the blending parameter $\alpha$ ($0 < \alpha < 1$) in the above equation
is application dependent.
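
The blending update can be sketched per pixel as follows, assuming the background shape and the decoded residual have already been warped into mosaic coordinates. The function name `update_mosaic` is an assumption of this example.

```python
def update_mosaic(mosaic, bg_shape, residual, alpha):
    """Per pixel: M = (1 - a*S) * M' + a*S * (M' + residual),
    with a the blending parameter and S the background indicator."""
    return [
        [(1 - alpha * s) * m + alpha * s * (m + r)
         for m, s, r in zip(row_m, row_s, row_r)]
        for row_m, row_s, row_r in zip(mosaic, bg_shape, residual)
    ]


mosaic = [[100, 100]]
bg_shape = [[1, 0]]        # only the first pixel is visible background
decoded_residual = [[10, 10]]
blended = update_mosaic(mosaic, bg_shape, decoded_residual, alpha=0.5)
```

Algebraically the update reduces to adding `alpha * S * residual` to the previous content, so a small blending parameter makes the mosaic adapt slowly and a large one makes it track the latest frame closely.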

The method described above builds the mosaic with reference to time $t_0$, which can be a
time instant in the past or can be the current time or a future time instant. It is
straightforward to rewrite the above equations for the case where the mosaic is
continuously warped to the current time instant $t$.

Turning now to Fig. 4, a block diagram of the method is depicted. The purpose of this
drawing is to highlight the dependencies in the various components and quantities used by
the method of the invention. It also emphasizes the various warping and un-warping
stages that are necessary to align consecutive video fields.
Fig. 5 shows a block diagram of the digital video database system that uses the method of
the current invention.

Fig. 6 shows a block diagram of a video conferencing system that uses an off-line built
background sprite as a dynamic sprite during transmission.

Fig. 7 shows an example of how consecutive positions of a foreground object (here a car)
may be represented in a mosaic by plotting the successive positions of one or several
vertices. The vertices are shown statically in the mosaic and they capture one level of shape
description only.

Operation of the Various Embodiments 
A Mosaic-Based Video Conferencing and Videophone System. 

Referring now to Figs. 5 and 6, the communication protocol can include a configuration
phase during which the method described above is used to build the background mosaic.
During normal video transmission, the mosaic is used as a dynamic sprite and the blending
factor is set to 0 to prevent any updating. In this case, macroblock types may be dynamic
or static. In one extreme case, all macroblocks are static-type macroblocks, meaning that
the background mosaic is being used as a static sprite. In another extreme case, all
macroblocks are of type dynamic and the mosaic is being used as a dynamic (predictive)
sprite. This latter case requires a higher transmission bandwidth. Alternatively, a mosaic
of the background scene can be built before the transmission and then be used as a static
or a dynamic sprite during the normal transmission session.

A Mosaic-Based Video Database 

The above method may be used in populating and searching a database of video
bitstreams, i.e., a database of compressed bitstreams. In such a system, video clips are
compressed using the above method. The result is a compressed bitstream and a mosaic
generated during the encoding process. The mosaic image can be used as a representative
image of the video clip bitstream and its features can be used in indexing and retrieval of
the bitstream belonging to that video clip.

Furthermore, the motion trajectory of the foreground can be overlaid on top of the mosaic to
provide the user with a rough description of foreground motion in the sequence. The trajectory of a
foreground object can be represented by a set of points, each representing the position of a
particular feature of the foreground object at a given instant. The feature points can be
salient vertices of the object shape. A hierarchical description of object shape would bring
the additional advantage of allowing the database interface to overlay from coarse to fine
shape outlines in the mosaic. Consecutive vertex positions can be shown together in the
same background mosaic or could be displayed successively in time with the same mosaic
support. Note that this idea provides the additional benefit of facilitating motion-based
retrieval since the motion of the foreground is represented in mosaic reference space.

Referring now to Fig. 7, the background mosaic is comprised of the grass, sky, sun and
tree. The foreground object is a car subject to an accelerated motion and moving from left
to right. The shape of the car is shown in black. Eight vertices "V" have been selected to
represent this shape. Fig. 7 shows that consecutive positions of the car can be represented
in the mosaic by simply plotting the vertices at their successive positions. The color of the
vertices is changed from t0 to t0+1 and from t0+1 to t0+2 to avoid any possible
confusion. In this example, the vertices are shown statically in the mosaic and they capture
one level of shape description only. Finally, the mosaic can be used as an icon. By clicking on
the mosaic icon, the user would trigger playback of the sequence.

Support of Multiple Mosaics in Applications with Frequent Scene Changes. 

In the case where the video sequence includes rapid and frequent changes from one scene
to another, as may be the case in video conferencing applications, it is desirable to build
two or more (depending on how many independent scenes there are) mosaics
simultaneously. Having more than one mosaic does not force the system to re-initiate the
building of a new mosaic each time a scene cut occurs. In this framework, a mosaic is used
and updated only if the video frames being encoded share similar content. Note that more
than one mosaic can be updated at a time since mosaics are allowed to overlap.
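
One way to decide which mosaics to use and update for a given frame is sketched below. The text does not specify how "similar content" is measured; the mean absolute difference between the frame and each mosaic's warped prediction, and the threshold `max_diff`, are assumptions of this example, as are the function names.

```python
def mean_abs_diff(frame, pred):
    """Mean absolute difference over locations where the prediction exists."""
    diffs = [abs(f - p)
             for row_f, row_p in zip(frame, pred)
             for f, p in zip(row_f, row_p)
             if p is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def active_mosaics(frame, predictions, max_diff):
    """Return the names of all mosaics whose prediction is close enough
    to the frame; more than one mosaic may qualify."""
    return [name for name, pred in predictions.items()
            if mean_abs_diff(frame, pred) <= max_diff]


frame = [[10, 20],
         [30, 40]]
predictions = {
    "A": [[11, 19], [30, None]],       # background A, partly unknown
    "B": [[200, 210], [220, 230]],     # background B, unrelated scene
}
active = active_mosaics(frame, predictions, max_diff=5)
```

Mosaics that are not selected are simply left untouched until frames with similar content appear again.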

Optimal Viewport 

The arbitrary mapping $W_{t_0 \leftarrow (t-1)}(\,)$ used at the beginning of the method can be used to
represent the optimal spatial representation domain for the mosaic where distortion and
artifacts are minimized. While this is at this point an open problem that will require further
study on our part, there is little doubt that the possibility exists to find an optimal mosaic
representation where ambiguities (parallax problems) and/or distortion are minimized
according to a pre-defined criterion.

Improved Resolution

Likewise, the arbitrary mapping $W_{t_0 \leftarrow (t-1)}(\,)$ can include a zooming factor which has the
effect of building a mosaic whose resolution is potentially 2, 3, or N times larger than the
resolution of the video frames used to build it. The arbitrary fixed zooming factor provides
a mechanism by which fractional warping displacements across consecutive video frames
are recorded as integer displacements in the mosaic. The larger the zooming factor, the
longer the sequence must be before the mosaic can be completed (more pixel locations to
fill up). The MPEG-4 framework allows the implementation of such a scheme.
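
A small worked example of the fractional-to-integer effect: a displacement of half a pixel between frames becomes a displacement of exactly one pixel in a mosaic built with a zooming factor of 2. The sketch uses exact fractions to avoid rounding; the function names are hypothetical.

```python
from fractions import Fraction


def mosaic_shift(frame_shift, zoom):
    """Displacement in mosaic pixels for a displacement in frame pixels."""
    return Fraction(frame_shift) * zoom


def lands_on_grid(frame_shift, zoom):
    """True when the scaled displacement is a whole number of mosaic pixels."""
    return mosaic_shift(frame_shift, zoom).denominator == 1


half_pixel = Fraction(1, 2)
third_pixel = Fraction(1, 3)
```

With these definitions, `lands_on_grid(half_pixel, 2)` and `lands_on_grid(third_pixel, 3)` hold, while `lands_on_grid(half_pixel, 3)` does not, which illustrates why the choice of zooming factor matters for the displacements present in the sequence.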

We denote this arbitrary mapping $W_{\mathrm{res}}(\,)$. In the linear case, this operator is the identity
matrix multiplied by a constant scalar greater than 1. This scaling factor defines the
enlargement factor used for the mosaic. The mosaic update equation shown in step 11 can
be re-written as follows.

$$M(s,t) = \left[1 - \alpha\, W_{\mathrm{res}}\big(W_{t_0 \leftarrow t}(S_{\mathrm{bg}}(s,t))\big)\right] M'(s,t-1) + \alpha\, W_{\mathrm{res}}\big(W_{t_0 \leftarrow t}(S_{\mathrm{bg}}(s,t))\big) \left[ M'(s,t-1) + W_{\mathrm{res}}\big(W_{t_0 \leftarrow t}(C^{-1}C\{\Delta I(s,t)\})\big) \right]$$

This equation shows that the mosaic is being built at the fixed time $t_0$, which can be the
time corresponding to the first video frame, the time corresponding to the final frame or
any other time in between. In this case, the arbitrary mapping $W_{\mathrm{res}}(\,)$ is always composed
with the warping transformation $W_{t_0 \leftarrow t}$. When the mosaic is continuously warped toward
the current video frame, the update equation must be re-written as follows:

$$M(s,t) = \left[1 - \alpha\, S_{\mathrm{bg}}(s,t)\right] W_{t \leftarrow (t-1)}\big(M'(s,t-1)\big) + \alpha\, S_{\mathrm{bg}}(s,t) \left[ W_{t \leftarrow (t-1)}\big(M'(s,t-1)\big) + W_{\mathrm{res}}\big(C^{-1}C\{\Delta I(s,t)\}\big) \right]$$

The equation above shows that the arbitrary mapping $W_{\mathrm{res}}(\,)$ is no longer composed with
the frame-to-frame warping operator $W_{t \leftarrow (t-1)}$ but instead applied to the
compressed/de-compressed residuals. In MPEG-4, the arbitrary operator $W_{\mathrm{res}}(\,)$ can be transmitted with
appropriate extension of the syntax, as the first set of warping parameters, which currently
supports only positioning the first video frame in the mosaic buffer via a translational shift.

Coding of Video Sequences at Very Low Bit Rates.

In very low bit rate applications, the transmission of shape information may become an
undesirable overhead. The method described above can still operate when transmission
of shape information is turned off. This is accomplished by setting background shape
to one at every pixel (step 7) and setting the blending factor $\alpha$ to 1 (step 11). The
latter setting guarantees that the mosaic will always display the latest video
information, which is a necessity in this situation since foreground is included in the
mosaic. In this situation, the macroblock types can be either intra, inter, static sprite or
dynamic sprite. The sprite is being used as a static sprite if all macroblocks are of type
static. This is the most likely situation for a very low bit rate application since no
residual is transmitted in this case. The sprite is being used as a dynamic sprite if all
macroblocks are of type dynamic.
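
The effect of these two settings can be checked by substitution. The short derivation below assumes the step 11 update has the blended form written on its first line, with all warps taken as $W_{t_0 \leftarrow t}$ and the background indicator equal to one over the support of the warped frame:

```latex
\begin{aligned}
M(s,t) &= \bigl[1 - \alpha\,W_{t_0 \leftarrow t}(S_{\mathrm{bg}}(s,t))\bigr]\,M'(s,t-1)
        + \alpha\,W_{t_0 \leftarrow t}(S_{\mathrm{bg}}(s,t))\,
          \bigl[M'(s,t-1) + W_{t_0 \leftarrow t}(C^{-1}C\{\Delta I(s,t)\})\bigr] \\
       &= [1 - 1]\,M'(s,t-1)
        + 1\cdot\bigl[M'(s,t-1) + W_{t_0 \leftarrow t}(C^{-1}C\{\Delta I(s,t)\})\bigr]
        \qquad (S_{\mathrm{bg}} \equiv 1,\ \alpha = 1) \\
       &= M'(s,t-1) + W_{t_0 \leftarrow t}(C^{-1}C\{\Delta I(s,t)\})
\end{aligned}
```

Since $\Delta I(s,t)$ is the difference between the current frame and the mosaic prediction, the result is the prediction plus the decoded residual, that is, the decoded current frame expressed in mosaic coordinates, up to coding loss.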
CLAIMS

1. A method of sprite-based predictive video coding (encoding and decoding)
where sprite-building is automatic, and segmentation of the sprite object is automatic and
integrated into the sprite building as well as the encoding and decoding processes,
comprising:

initializing a binary field to zero for every position in a buffer;

acquiring the image and forwarding warping parameters for the image for mapping;

detecting any change in content between a previously coded/decoded frame
and the current frame;

identifying new background areas;

segmenting the foreground and background;

preparing foreground shape and texture by compressing or decompressing
the subject shapes;

deriving the background shape from the previously prepared foreground shape;

initializing the new background texture in mosaic;

determining the background texture residuals from the mosaic prediction;

updating the background shape mosaic; and

updating the mosaic in all regions corresponding to new or non-covered background.

2. A compressed video database system wherein sprites built during encoding 
are used as representative images of input video clips that can be analyzed and indexed for 
storage and retrieval purposes, comprising: 

a sprite-based encoder for receiving a video clip and generating a video 
bitstream and a mosaic; 

a feature extractor for extracting features from said mosaic and for 
identifying representative features; 

a video database generator for generating a video database from said 
representative features and said video bitstream; and 

a search engine for searching said video database for selected 
representative features. 